1. Objective


In  this  effort,  we  formulated  a  technique  for  automatically  determining  effective  probability  mass 
functions  required  by  Dempster-Shafer  (DS)  automatic  target  detection  (ATD)  algorithms  for 
multi-sensor  data  fusion.  The  initial  step  in  this  process  comprises  defining  an  ATD  algorithm 
performance  measure.  The  second  step  comprises  formulating  the  automatic  probability  mass 
assignments.  In  short,  our  goal  was  to  define  the  distribution  of  probability  mass  assigned  to  the 
DS  “don’t  know”  hypothesis  so  as  to  improve  the  performance  of  the  fused  ATD  algorithm 
relative  to  the  results  obtained  using  standard  Bayesian  approaches. 


2.  Approach 


Since  DS  theory  includes  the  Bayesian  theory  for  statistically  independent  random  variables  as  a 
special  case,  we  have  restricted  our  attention  to  independent  random  data  generated  using 
Bayesian-based  techniques.  This  approach  allowed  us  to  generate  the  large  number  of  data 
samples  required  to  specify  the  desired  probability  mass  functions.  In  addition,  it  enabled  us  to 
verify  that  the  approach  was  suitable  for  a  variety  of  underlying  probability  distributions.  As 
noted  in  the  literature  (7),  the  number  of  unknown  DS  parameters  increases  rapidly  with  the 
number  of  measurement  sensors  and  detection  hypotheses.  Hence,  for  this  investigation,  we 
restricted  our  attention  to  fusion  of  detection  algorithm  outputs  from  two  sensors,  thereby 
keeping  the  number  of  unknown  parameters  manageable.  The  techniques  developed  here, 
however,  are  readily  extensible  to  a  larger  sensor  suite  and  a  larger  number  of  target  classes,  if 
the  increased  computational  requirements  are  tolerable. 

The  DS  mass  function  is  similar  to  the  Bayesian  probability  density  function  (pdf).  It  assigns  a 
number  to  a  measurement  (which  could  include  a  detection  algorithm  output),  expressing  the 
likelihood  that  the  measurement  indicates  a  specific  object,  such  as  clutter  or  target.  In  fact,  we 
calculated the DS mass functions from the histograms of data samples from the “target”
and  “clutter”  hypotheses.  Calculating  the  mass  assigned  to  the  “don’t  know”  hypothesis, 
however,  is  not  so  straightforward,  and  we  spent  a  great  deal  of  effort  surmounting  this  problem. 

We  began  our  development  for  this  project  by  considering  the  DS  joint  probability  mass  function 
(pmf)  for  measurements  from  two  independent  sensors.  The  joint  pmf  is  the  DS  analog  of  the 
Bayesian  joint  pdf,  and  it  can  be  expressed  in  terms  of  the  mass  functions  for  the  individual 
sensors  using  the  DS  combination  rule: 


\[
m_{12}(X, H_i) =
\begin{cases}
\dfrac{m_1(x_1, H_0)\, m_2(x_2, H_0) + m_1(x_1, H_0)\, m_2(x_2, H_0 \cup H_1) + m_1(x_1, H_0 \cup H_1)\, m_2(x_2, H_0)}{K}, & i = 0 \\[2ex]
\dfrac{m_1(x_1, H_1)\, m_2(x_2, H_1) + m_1(x_1, H_1)\, m_2(x_2, H_0 \cup H_1) + m_1(x_1, H_0 \cup H_1)\, m_2(x_2, H_1)}{K}, & i = 1 \\[2ex]
\dfrac{m_1(x_1, H_0 \cup H_1)\, m_2(x_2, H_0 \cup H_1)}{K}, & i = 2
\end{cases}
\tag{1}
\]


where $m_{12}(X, H_i)$ denotes the joint pmf for sensor 1 and sensor 2; $H_i$ denotes hypothesis $i$ (either
“target,” “clutter,” or “uncertain”); $X$ denotes the vector of measurement statistics (features)
from sensor 1 and sensor 2; and $H_0 \cup H_1$ denotes the region of feature space assigned probability
mass corresponding to “uncertain” or “don’t know.” $K$ is a normalizing factor defined by

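\[
\begin{aligned}
K ={}& m_1(x_1, H_0)\, m_2(x_2, H_0) + m_1(x_1, H_0)\, m_2(x_2, H_0 \cup H_1) + m_1(x_1, H_0 \cup H_1)\, m_2(x_2, H_0) \\
&+ m_1(x_1, H_1)\, m_2(x_2, H_1) + m_1(x_1, H_1)\, m_2(x_2, H_0 \cup H_1) + m_1(x_1, H_0 \cup H_1)\, m_2(x_2, H_1) \\
&+ m_1(x_1, H_0 \cup H_1)\, m_2(x_2, H_0 \cup H_1).
\end{aligned}
\tag{2}
\]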
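
The combination rule in equations 1 and 2 can be sketched in a few lines of Python for a single pair of measurements. This is a minimal illustration only: the function name `combine_two_sensors` and the three-element layout (clutter, target, uncertain) are assumptions introduced here and do not come from the original work.

```python
import numpy as np

def combine_two_sensors(m1, m2):
    """Combine per-sensor masses [m(H0), m(H1), m(H0 U H1)] for one
    measurement pair (x1, x2), following equations 1 and 2.

    Returns the joint masses [m12(H0), m12(H1), m12(H0 U H1)] and K.
    The array layout is a hypothetical convention chosen for this sketch.
    """
    c1, t1, u1 = m1  # sensor 1: clutter, target, uncertain
    c2, t2, u2 = m2  # sensor 2: clutter, target, uncertain
    num_clutter = c1 * c2 + c1 * u2 + u1 * c2   # numerator for i = 0
    num_target = t1 * t2 + t1 * u2 + u1 * t2    # numerator for i = 1
    num_unknown = u1 * u2                       # numerator for i = 2
    k = num_clutter + num_target + num_unknown  # equation 2
    return np.array([num_clutter, num_target, num_unknown]) / k, k

joint, k = combine_two_sensors([0.6, 0.3, 0.1], [0.2, 0.5, 0.3])
```

Because K is the sum of the three numerators, the joint masses sum to one. K also equals one minus the conflicting mass, that is, the products pairing “clutter” from one sensor with “target” from the other.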

Our goal is to define a metric and a procedure that enable us to optimally assign probability mass
to $H_0 \cup H_1$. One approach used in the past measured the amount of uncertainty remaining
after data fusion, declaring the pmf mass assignments to be good if this residual uncertainty is
reduced (2), while another combined maximum-likelihood parameter estimation
with DS belief functions (5). In what follows, we eschew the statistical quantities, such
as plausibility and belief, which are often employed by practitioners of the DS theory (4).

Instead,  we  present  a  novel  approach  that  combines  elements  of  DS-theory  with  classical 
detection-theoretic  techniques  commonly  used  within  the  automatic  target  recognition  (ATR) 
community.  That  is,  we  modified  the  DS  joint  mass  function  to  improve  the  performance  of  a 
test  based  on  the  classical  likelihood-ratio  test  (LRT),  and  we  quantified  this  performance  via 
receiver  operator  characteristic  (ROC)  curves.  It  is  this  combination  of  the  classical  LRT,  a  ROC 
performance  metric,  and  the  DS  probabilistic  formulation  that  provides  the  novelty  to  our 
approach. 

In  the  past,  researchers  have  used  the  area  under  the  ROC  curve  to  compare  the  performance  of 
two competing algorithms (5). We followed a similar approach and defined a fitness measure,
r(i), where i denotes the iteration number, that expresses performance in terms of the area
under the ROC curve between two pre-determined probabilities of false alarm (Pfas). Hence,
we  optimized  the  DS  parameters  for  specific  operating  points  at  the  expense  of  potentially 
degrading  performance  at  other,  less  desirable  operating  points. 
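
A fitness measure of this kind can be sketched as the area under an empirical ROC curve restricted to a Pfa interval. The sketch below is one possible implementation under stated assumptions: the function name `partial_auc`, the uniform Pfa grid, and the trapezoidal integration are choices made here for illustration, and the source does not specify how the area was computed.

```python
import numpy as np

def partial_auc(clutter_scores, target_scores, pfa_lo, pfa_hi, n_points=200):
    """Area under the empirical ROC curve for Pfa in [pfa_lo, pfa_hi].

    Scores are test-statistic values; larger values favor "target".
    """
    clutter = np.sort(np.asarray(clutter_scores, dtype=float))
    target = np.sort(np.asarray(target_scores, dtype=float))
    pfa = np.linspace(pfa_lo, pfa_hi, n_points)
    # Threshold for each Pfa: the (1 - Pfa) empirical quantile of clutter scores.
    thresholds = np.quantile(clutter, 1.0 - pfa)
    # Pd: fraction of target scores strictly above each threshold.
    pd = 1.0 - np.searchsorted(target, thresholds, side="right") / target.size
    # Trapezoidal integration of Pd over the Pfa grid.
    return float(np.sum(0.5 * (pd[1:] + pd[:-1]) * np.diff(pfa)))
```

The value is bounded above by the width of the Pfa interval, which is reached only when Pd equals one across the whole interval.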

We calculated the ratio-based test statistics according to

\[
\frac{m_{12}(X, H_1)}{m_{12}(X, H_0)}, \tag{3}
\]


where $m_{12}(X, H_i)$ is the joint pmf described in equation 1, $H_1$ denotes the “target” hypothesis,
and $H_0$ denotes the “clutter” or “non-target” hypothesis. Our likelihood ratio test then becomes
simply


\[
m_{12}(X, H_1) \;\underset{\text{Declare } H_0}{\overset{\text{Declare } H_1}{\gtrless}}\; k\, m_{12}(X, H_0), \tag{4}
\]

where $k$ is a pre-determined threshold selected to fix a specific operational Pfa or probability of
detection (Pd). Note that for the case of zero uncertainty, $m_{12}(X, H_0 \cup H_1) = 0$, and the likelihood
ratio test reduces to the classical Bayesian likelihood ratio test.
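
The test in equation 4 can be sketched as follows. The normalizing factor K is common to both joint masses and cancels, so only the numerators of equation 1 are needed. The function names and the (clutter, target, uncertain) ordering are assumptions made for this illustration.

```python
def joint_numerators(m1, m2):
    """Numerators of m12(X, H0) and m12(X, H1) from equation 1.

    m1, m2: per-sensor masses (clutter, target, uncertain); this ordering
    is a hypothetical convention for the sketch.
    """
    c1, t1, u1 = m1
    c2, t2, u2 = m2
    clutter = c1 * c2 + c1 * u2 + u1 * c2
    target = t1 * t2 + t1 * u2 + u1 * t2
    return clutter, target

def declare_target(m1, m2, k=1.0):
    """Equation 4: declare H1 when m12(X, H1) > k * m12(X, H0)."""
    clutter, target = joint_numerators(m1, m2)
    return target > k * clutter

# With zero uncertainty mass at both sensors, the statistic is the product
# of the per-sensor ratios, as in the classical test for independent data.
clutter, target = joint_numerators((0.4, 0.6, 0.0), (0.25, 0.75, 0.0))
ratio = target / clutter
```

In this zero-uncertainty example the ratio equals (0.6 × 0.75) / (0.4 × 0.25), the product of the two single-sensor ratios.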

We recognize immediately from equations 4 and 1 that we can adjust constituent parameters
contributing to $m_{12}(X, H_i)$ and alter the test statistic at selected operating Pfas. Values of $m_i(x_i, H_j)$
for $i = 1, 2$ and $j = 0, 1$ are dictated by the underlying marginal pdfs, and we calculate them using
standard histogram-based techniques. Values of $m_i(x_i, H_0 \cup H_1)$, however, provide us with the
degrees of freedom necessary to improve the performance of the likelihood ratio test. These
probability “masses” comprise the parameters that we optimize via dynamic training algorithms.
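
One way to turn class histograms and an uncertainty function into per-sensor masses is sketched below. The text does not state how the histogram values and the uncertainty mass are normalized against one another, so the rule used here (the mass left over after the uncertainty is split in proportion to the class histogram densities) is purely an assumption of this sketch, as is the function name `sensor_masses`.

```python
import numpy as np

def sensor_masses(clutter_samples, target_samples, uncertainty, bin_edges):
    """Per-bin masses [m(H0), m(H1), m(H0 U H1)] for one sensor.

    uncertainty: one m(H0 U H1) value per histogram bin, each in [0, 1).
    ASSUMPTION: the remaining mass (1 - uncertainty) is shared between
    clutter and target in proportion to their histogram densities.
    """
    p0, _ = np.histogram(clutter_samples, bins=bin_edges, density=True)
    p1, _ = np.histogram(target_samples, bins=bin_edges, density=True)
    total = p0 + p1
    safe_total = np.where(total > 0, total, 1.0)
    # Empty bins carry no evidence; split them evenly (another assumption).
    target_share = np.where(total > 0, p1 / safe_total, 0.5)
    u = np.asarray(uncertainty, dtype=float)
    m_target = (1.0 - u) * target_share
    m_clutter = (1.0 - u) * (1.0 - target_share)
    return np.stack([m_clutter, m_target, u])
```

With this convention the three masses in every bin sum to one, and setting the uncertainty to zero everywhere recovers a purely histogram-based assignment.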

Many  of  the  algorithms  for  determining  optimal  parameters  rely  upon  recursive  techniques  based 
on  gradient  descent,  and  they  typically  attempt  to  minimize  a  differentiable  cost  function,  such  as 
mean-squared  error.  Since  we  use  the  area  under  the  ROC  curve  as  our  fitness  measure,  our 
metric  calculation  is  nonlinear  and  unsuitable  for  a  recursive,  gradient-based  technique.  As  a 
result,  we  have  identified  and  implemented  an  appropriate  nonlinear  optimization  procedure  that 
is  referred  to  in  the  literature  as  the  “particle  swarm”  algorithm  (6).  As  its  name  suggests,  the 
particle  swarm  algorithm  mimics  the  dynamics  of  swarming  insects.  Its  initial  step  involves  the 
creation  of  N  swarm  members,  each  located  at  some  point  in  the  multi-dimensional  parameter 
space. In our application, this multi-dimensional space consists of the DS mass functions for the
“don’t know” hypothesis, $m_i(x_i, H_0 \cup H_1)$, and examples of initial seeds for these weighting
functions are shown in figure 1. We refer to the set of initial weighting functions as “basis
functions,” because they generate the final DS mass function. The fitness function (in our case,
a version of the area under the ROC curve) is evaluated at each swarm member’s location, and
the maximum (global) fitness value is recorded together with its corresponding location. Each
swarm member’s location is then modified according to equation 5:

\[
s_j(i+1) = s_j(i) + \mu \left( g(i) - s_j(i) \right), \tag{5}
\]
where $s_j(i)$ denotes the location of swarm member $j$ within the parameter space at iteration $i$; $g(i)$
denotes the “globally best” location found by all swarm members up to iteration $i$; and $0 < \mu < 1$ is a constant
governing the step size from $s_j(i)$ in the direction of $g(i)$. As the algorithm progresses, the
members  of  the  swarm  wander  through  the  parameter  space  in  the  direction  of  the  ever-changing 
“globally  best”  location.  Whenever  a  new  maximum  is  found,  both  the  new  maximum  and  the 
new parameter value are recorded, and the corresponding swarm member is reinitialized to another
randomly  selected  basis  function.  The  algorithm  terminates  either  after  a  fixed  point  in  the 
parameter  space  has  been  found  or  a  designated  number  of  iterations  have  been  completed.  After 
the  algorithm  has  converged,  we  are  left  with  a  final  version  of  the  mass  function  as  illustrated 
in  figure  2. 
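
The update rule in equation 5, together with the restart step described above, can be sketched as follows. This is a simplified illustration rather than the implementation used in this work: the function name `swarm_search`, the fixed iteration count as the only stopping rule, and the default step size are assumptions made here.

```python
import numpy as np

def swarm_search(fitness, seeds, mu=0.1, n_iter=200, rng=None):
    """Maximize `fitness` with the update s_j(i+1) = s_j(i) + mu * (g(i) - s_j(i)).

    seeds: array of shape (n_members, n_params) holding the basis functions.
    A member that finds a new global maximum is restarted from a randomly
    chosen seed. Stops after n_iter iterations (an assumed stopping rule).
    """
    rng = np.random.default_rng() if rng is None else rng
    seeds = np.asarray(seeds, dtype=float)
    swarm = seeds.copy()
    best_loc, best_fit = None, -np.inf
    for _ in range(n_iter):
        for j in range(len(swarm)):
            value = fitness(swarm[j])
            if value > best_fit:
                # Record the new maximum and its location.
                best_fit, best_loc = value, swarm[j].copy()
                # Restart this member from another randomly selected seed.
                swarm[j] = seeds[rng.integers(len(seeds))]
        # Equation 5: every member steps toward the globally best location.
        swarm += mu * (best_loc - swarm)
    return best_loc, best_fit
```

In the application described here, each row of `seeds` would hold a sampled uncertainty function and `fitness` would evaluate the area under the ROC curve over the chosen Pfa interval.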


Figure 1. Example of members of a swarm at an initial
iteration for $m_i(x_i, H_0 \cup H_1)$. The feature value is a
normalized output from a notional ATD algorithm.

Figure 2. Example of final mass functions after the particle
swarm algorithm has converged; here, $i = 1, 2$.




Both  the  number  and  initial  disposition  of  swarm  members  play  a  critical  role,  since  they 
determine  how  the  search  will  proceed.  We  specify  300  randomly  selected  initial  locations  in  the 
parameter  space  so  as  to  bracket  anticipated  values  of  the  unknown  mass  functions,  and  we 
specify a scalar value of $\mu$ that produces a step size judged to be adequate but not too large (on
the order of $\|g(i) - s_j(i)\|/10$). The 300 functions comprise different triangular waveforms similar
to those depicted in figure 1, and we also vary the magnitude of mass functions from sensor 1
relative  to  those  from  sensor  2.  That  is,  we  include  three  separate  cases:  one  in  which  the  mass 
values  for  sensor  1  are  smaller  than  the  corresponding  mass  values  for  sensor  2,  a  second  in 
which  the  mass  values  are  about  the  same  size,  and  a  third  in  which  the  mass  values  for  sensor  1 
are  larger  than  the  corresponding  mass  values  for  sensor  2.  Once  again,  this  is  done  in  an  attempt 
to  bracket  the  region  of  the  parameter  space  containing  the  solution.  The  entire  particle  swarm 
procedure  is  outlined  in  the  block  diagram  in  figure  3. 
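
A set of triangular seed functions like those described above might be generated as follows. The sketch covers one sensor only, and the random placement of the triangle vertices, the height limit `max_height`, and the function names are all assumptions introduced for illustration.

```python
import numpy as np

def triangular_seed(x, left, peak, right, height):
    """Triangular function: zero outside [left, right], `height` at `peak`."""
    shape = np.interp(x, [left, peak, right], [0.0, 1.0, 0.0], left=0.0, right=0.0)
    return height * shape

def make_seeds(x, support, n_seeds=300, max_height=0.5, rng=None):
    """Draw n_seeds triangular waveforms confined to the interval `support`.

    max_height is a hypothetical cap on the uncertainty mass.
    """
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = support
    seeds = np.empty((n_seeds, x.size))
    for j in range(n_seeds):
        # Three sorted random points give the left foot, peak, and right foot.
        left, peak, right = np.sort(rng.uniform(lo, hi, size=3))
        seeds[j] = triangular_seed(x, left, peak, right, rng.uniform(0.0, max_height))
    return seeds

x = np.linspace(0.0, 1.0, 101)  # normalized feature values
seeds = make_seeds(x, support=(0.3, 0.7), rng=np.random.default_rng(0))
```

The three magnitude cases could then be produced by scaling the sensor 1 seeds relative to the sensor 2 seeds, for example with scale factors below, equal to, and above one.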


Figure  3.  Block  diagram  of  a  particle  swarm. 

To evaluate our proposed technique, we simulated multiple sensing scenarios, varying the
means, standard deviations, and correlations of the two-dimensional Gaussian-distributed feature
vectors representing measurements from two independent sensors. The $10^6$ samples representing
clutter measurements from one of the sensors were independent and identically distributed (iid),
and they may or may not have been correlated with the $10^6$ samples representing clutter
measurements from the other sensor. The $10^5$ samples representing target measurements from
one of the sensors were iid and were independent of the clutter samples. For some of the
scenarios they were, however, correlated with the $10^5$ samples representing measurements from
the second sensor. Scatter plots of simulated measurements for representative scenarios are
shown in figure 4.
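
Data of this kind can be simulated with a bivariate Gaussian generator, as in the sketch below. The means, standard deviations, and correlation coefficients are illustrative values chosen here, not the parameters of the scenarios in this report, and the sample counts are reduced to keep the example fast.

```python
import numpy as np

def simulate_features(n, mean, sigma1, sigma2, rho, rng):
    """Draw n two-sensor feature vectors from a bivariate Gaussian.

    rho is the correlation coefficient between the two sensor outputs.
    """
    cov = np.array([
        [sigma1 ** 2, rho * sigma1 * sigma2],
        [rho * sigma1 * sigma2, sigma2 ** 2],
    ])
    return rng.multivariate_normal(mean, cov, size=n)

rng = np.random.default_rng(0)
# Hypothetical parameter values; columns are sensor 1 and sensor 2.
clutter = simulate_features(100_000, [0.0, 0.4], 0.5, 0.2, 0.3, rng)
target = simulate_features(10_000, [1.2, 2.5], 0.8, 0.4, 0.3, rng)
```

Setting `rho` to zero gives the uncorrelated case, in which the two sensor outputs are independent.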

Figure 4. Scatter plots of sample clutter
(red) and target (blue) data. The values for sensor 1 and
sensor 2 represent outputs of a notional ATD algorithm.

We calculated the mass function for a particular sensor from the histogram of relevant data
samples in figure 4, and examples of these mass functions are shown in figure 5. Notice that the
amount of correlation varies slightly between the two data sets. We defined a region of support
for basis functions, such as those shown in figure 1, by examining the sensor measurements for
which target and clutter samples have similar mass values in the plots of figure 5. For our
investigations, we assigned the non-zero mass for $H_0 \cup H_1$ (the uncertainty region) to
sensor measurements with corresponding mass assignments in the interval [0.2, 0.8], a selection
based on results of a brief parametric study conducted using one of the preliminary data sets.
Note that we fixed the parameter $k$ in equation 4 at 1.0 in order to expedite the training process, but
in the most general formulation it too would be determined by the training procedure.
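
The region of support can be sketched as a mask over histogram bins. The text does not say exactly which mass value is compared against the interval [0.2, 0.8], so this sketch assumes it is the normalized target mass in each bin; that reading, and the function name `support_mask`, are assumptions.

```python
import numpy as np

def support_mask(p_clutter, p_target, lo=0.2, hi=0.8):
    """Flag histogram bins where target and clutter masses are comparable.

    ASSUMPTION: a bin belongs to the uncertainty region when the normalized
    target mass p_target / (p_clutter + p_target) lies in [lo, hi].
    """
    p_clutter = np.asarray(p_clutter, dtype=float)
    p_target = np.asarray(p_target, dtype=float)
    total = p_clutter + p_target
    # Bins with no samples get a share of zero and fall outside the region.
    share = np.divide(p_target, total, out=np.zeros_like(total), where=total > 0)
    return (share >= lo) & (share <= hi)

mask = support_mask([0.9, 0.5, 0.1, 0.0], [0.1, 0.5, 0.9, 0.0])
```

Only the second bin is flagged in this example, because it is the only one where neither hypothesis dominates.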

3. Results


We  evaluated  detection  algorithm  performance  using  data  distributed  as  illustrated  in  figure  4. 
Our  evaluation  metric  was  the  ROC  curve,  and  we  restricted  our  attention  to  the  specific  Pfa 
interval defined as part of the training procedure. Since our Pfas of interest were low, we used
$4 \times 10^6$ clutter samples to ensure that we had a significant number of samples available for
estimating these probabilities. Results are included in figure 6, along with the underlying
detection  algorithm  and  data  generation  parameters.  Figure  7  shows  the  resulting  uncertainty 
functions  together  with  the  probability  mass  functions  for  sensor  1  and  sensor  2.  We  do  not 
include  the  analyses  for  uncorrelated  data,  since  these  results  did  not  indicate  a  significant 
difference  between  the  Bayesian  and  the  DS  approaches.  For  uncorrelated  data,  the  particle 
swarm  converged  to  uncertainty  functions  that  were  zero  or  nearly  zero  everywhere. 


(a) Parameter set 1. $\sigma_1 = 0.5$, $\sigma_2 = 0.13$, $\sigma_{12} = 0.15$ for clutter; $\sigma_{12} = 0.2$;
$\sigma_1 = 0.75$, $\sigma_2 = 0.42$, $\sigma_{12} = 0.23$ for target;
clutter mean = [0.0, 0.38], target mean = [1.25, 2.62]

(b) Parameter set 2. $\sigma_1 = 0.5$, $\sigma_2 = 0.13$;
$\sigma_1 = 0.75$, $\sigma_2 = 0.42$, $\sigma_{12} = 0.3$;
clutter mean = [0.0, 0.38], target mean = [1.25, 2.62]


Figure  6.  ROC  curves  generated  for  two  different  sets  of  algorithm  and  data  parameter  values;  the  data  sets 
were used to generate plots in figures 4 and 5, and all data follow a normal distribution.

(a) DS mass functions for sensor 1, Parameter set 1. (b) DS mass functions for sensor 2, Parameter set 1.


(c)  DS  mass  functions  for  sensor  1,  Parameter  set  2.  (d)  DS  mass  functions  for  sensor  2,  Parameter  set  2. 


Figure  7.  DS  mass  functions  (including  uncertainty  function,  in  green)  found  by  the  particle  swarm  for 
parameter  sets  shown  in  figure  6. 


4.  Conclusions 


The  simulation  results  indicate  that  it  is  possible  to  enhance  performance  of  the  ratio-based 
detection algorithm over a designated Pfa interval, although this gain could result in decreased
performance  over  a  range  of  Pfas  that  are  not  of  interest.  The  expected  improvements  in 
performance  appear  to  be  insignificant  when  the  assumption  of  independence  between  the  two 
sensors  is  satisfied,  but  this  is  not  entirely  surprising,  since  the  classic  LRT  was  derived  under 
this  assumption  of  independence.  When  correlation  between  the  sensor  measurements  is 
introduced,  however,  the  improvement  obtained  via  DS  can  be  significant.  This  improvement  is 
due  to  the  additional  degrees  of  freedom  available  from  the  inclusion  of  the  “don’t  know” 
category (for individual sensors) into the calculation of the DS joint mass function. While
data-intensive, the DS method provides the possibility of enhancing detection algorithm performance
at critical values of Pfa, and our procedure, combining concepts from DS and classical detection
theory, exploits this additional flexibility. With an adequate amount of training data, the
DS-based approach could become an attractive option.

List of Symbols, Abbreviations, and Acronyms


ATD  automatic  target  detection 

ATR  automatic  target  recognition 

DS  Dempster-Shafer 

iid  independent  and  identically  distributed 

LRT  likelihood-ratio  test 

Pd  probability  of  detection 

pdf  probability  density  function 

Pfa  probability  of  false  alarm 

pmf  probability  mass  function 

ROC  receiver  operator  characteristic 